APPLICATION OF SPEED READING TECHNIQUES
IN TEXT-TO-SPEECH GENERATION 



FIELD OF THE INVENTION: 

The present invention relates to the conversion of text to speech, and more 
particularly to a method and device for converting text to speech such that playback 
duration is decreased without significantly reducing the comprehensibility of the message.



BACKGROUND OF THE INVENTION: 

Text-to-speech ("TTS") systems facilitate audible delivery of textual messages. TTS
systems are useful in situations where accessing textual information may be inconvenient or 
impossible for the user. For example, TTS systems may be used to retrieve electronic mail 
("e-mail") remotely by telephone. 

Generally, TTS systems operate by inputting fixed text segments, such as sentences,
and converting them into speech through a specific algorithm. The particular algorithm
employed determines the characteristics of the resultant audible speech. Less sophisticated 
TTS systems typically employ simpler conversion algorithms that may generate speech with 
a mechanical or unnatural sound. More advanced systems make use of complex prosody 
algorithms that generate speech which more closely models human speaking patterns in 
terms of intonation, tempo, rhythm and pitch. 

Known TTS systems typically apply a predetermined speaking rate to all generated 
speech based on the designer's preference. This default rate may be perceived by the 
listener as being very slow, depending of course on such factors as the familiarity of the
user with the synthetic voice, the quality of the transmission medium, and the complexity
and predictability of the information being spoken. Excessive playing duration wastes
valuable time and can result in frustration on the part of the listener.

To address the problem of slow playback, some TTS systems have added a user 
interface that permits the listener to increase the playing speed of the generated speech. In
such systems, speech is typically accelerated through a uniform speedup of each synthesized
word. Hence, important words are accelerated by the same factor as relatively insignificant
words. This acceleration of key words tends to negatively impact on the user's ability to
comprehend them. Disadvantageously, the diminished comprehensibility of the important
words in turn tends to reduce the comprehensibility of the overall message.

Accordingly, what is needed is a method of converting text to speech such that the 
playback duration is decreased while the comprehensibility of the message is not 
significantly reduced. 



SUMMARY OF THE INVENTION: 

It is an object of the present invention to provide a method and device for converting 
text to speech such that playing duration is decreased without significantly reducing the
comprehensibility of the generated speech. 

Briefly, the foregoing and other objects are achieved through an application of 
speed-reading techniques to the TTS conversion process. The human skill of speed-reading 
involves the identification of words that do not contribute to comprehension and the 
accelerated scanning or skipping thereof. Similarly, the present invention evaluates words,
and optionally punctuation, as to importance and certain other characteristics (e.g. word
length) and processes them differently based on the identified "linguistic profile". In
particular, words of lesser importance are played at a faster rate or skipped entirely, while
more meaningful words are played at a slower rate. Furthermore, longer words are played
at a slightly faster rate than words of average length. In this manner, the comprehensibility
of the most meaningful words in a message is maintained at a high level while the playback
duration of the message is reduced. 

In one aspect, there is provided a method of decreasing the playing duration of
speech generated from a text segment comprising counting syllables in each word of said 
text segment and assigning a playing rate indicator to said each word of said text segment 
based on a total number of syllables in said word. 

In another aspect, there is provided a method of decreasing the playing duration of
speech generated from a text segment, comprising performing a grammatical analysis of
said text segment and assigning a playing rate indicator to each word of said text segment 
based on said grammatical analysis. 

In yet another aspect, there is provided a method of decreasing the playing duration
of speech generated from a text segment comprising comparing each word of said text 
segment to an inventory of pre-selected words and assigning a playing rate indicator to said 
each word of said text segment based on said comparison. 

A computing device and computer readable medium for carrying out the methods of 
the invention are also provided.

Other aspects and features of the present invention will become apparent to those 
ordinarily skilled in the art upon review of the following description of specific 
embodiments of the invention in conjunction with the accompanying figures. 



BRIEF DESCRIPTION OF THE DRAWINGS: 

In figures which illustrate, by way of example, embodiments of the present 
invention:

FIG. 1 is a schematic diagram illustrating a text-to-speech system exemplary of an
embodiment of the present invention; 

FIG. 2 is a schematic diagram illustrating the linguistic profiling unit of FIG. 1 in 
greater detail; 

FIG. 3 illustrates an exemplary playing rate indicator ("PRI") array that may be used 
by the linguistic profiling unit of FIG. 2; 

FIG. 4 is a schematic diagram illustrating the text-to-speech engine of FIG. 1 in
greater detail;

FIGS. 5A, 5B and 5C are flowcharts illustrating a method exemplary of an
embodiment of the present invention; 

FIGS. 6A and 6B illustrate an exemplary instantiation of the PRI array of FIG. 3
prior to linguistic profiling and following linguistic profiling, respectively; and 

FIGS. 7A and 7B are graphical representations of synthesized speech illustrating the 
acceleration in playing duration which may be effected.

DETAILED DESCRIPTION: 

With reference to FIG. 1, a TTS system 10 includes a linguistic profiling unit 12 and
a TTS engine 14. The TTS system 10 has two inputs, namely, a text segment input 16 and a
user control information input 24. Inputs 16 and 24 feed the subordinate linguistic
profiling unit 12 of system 10. The TTS system 10 also has a single output 22 suitable for
carrying synthesized speech from the TTS engine 14. The linguistic profiling unit 12 is
interconnected with the TTS engine 14 by links 18 and 20. The first link 18 carries textual
information while the second link 20 carries playing rate indicator (PRI) information. The
TTS system 10 is typically a conventional computing device, such as an Intel x86-based or
PowerPC-based computer, executing software in accordance with the method as described
herein. The software may be loaded into the system 10 from any suitable computer 
readable medium, such as a magnetic disk 19, an optical storage disk, a memory chip, or a 
file downloaded from a remote source. 

FIG. 2 illustrates an exemplary architecture of the linguistic profiling unit 12. The
role of the linguistic profiling unit 12 is to determine the linguistic profile of each word and
each element of punctuation in the input text segment. The linguistic profiling unit 12
includes a controller 30 and four linguistic profiling modules 32, 34, 36, and 38. Each
module represents a different technique for identifying words or pauses that may be
accelerated without significantly reducing the comprehensibility of the message. The four
linguistic profiling modules in the present embodiment are a pre-selected word inventory 
32; a grammar analysis unit 34; a syllable counter 36; and a punctuation analysis unit 38. 
These four modules are interconnected with the controller 30 by links 42, 44, 46 and 48, 
respectively. 

The pre-selected word inventory 32 is a database of words that have previously been 
identified as being linguistically unimportant with respect to the particular application 
regardless of the context in which they are used. This database may contain prepositions or 
diminutive words, for example. Preferably, the pre-selected word inventory 32 may be
easily configured to include or exclude words as needed to provide flexibility in adapting
the invention to a particular application. The pre-selected word inventory 32 is capable of 
receiving words from the controller 30 and outputting match information to the controller 
30 via link 42. 

The grammar analysis unit 34 is a module capable of performing grammatical 
analysis on a text segment. Grammatical analysis typically comprises, at a minimum, the 
identification of the part of speech of each word in the segment, but may also include other 
forms of grammatical analysis. The grammar analysis unit 34 may employ a grammar 
analysis engine. Known grammar checking engines, such as Wintertree Software Inc.'s
"Wgrammar" grammar checker, for example, may be adapted for this purpose. The
grammar analysis unit 34 is capable of receiving text segments from the controller 30 and
outputting grammatical information to the controller 30 via link 44. 

The syllable counter 36 is a module capable of determining the number of syllables 
in a word. Syllable counting may be achieved for example through a breakdown of words 
into phonemes and a subsequent tallying thereof. The syllable counter 36 is capable of 
receiving words from the controller 30 and outputting syllable count information to the
controller 30 via link 46. 
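
As a rough illustration of the syllable counter 36, the following Python sketch estimates the number of syllables in a word. It substitutes a simple vowel-group heuristic for the phoneme breakdown mentioned above, so it will miscount some words; the function name `count_syllables` and the heuristic rules are assumptions made purely for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, not phoneme-based)."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    # A trailing "e" is usually silent (as in "garage"), unless the word
    # ends in consonant + "le" (as in "motorcycle").
    if w.endswith("e") and not re.search(r"[^aeiouy]le$", w):
        count -= 1
    return max(count, 1)


print(count_syllables("motorcycle"))  # 4
print(count_syllables("garage"))      # 2
```

A production system would more plausibly reuse the phonetic transcription already performed for speech synthesis rather than rely on spelling.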

The punctuation analysis unit 38 is a module capable of determining the importance
of the punctuation that follows certain words in a text segment. Punctuation importance is
typically dependent upon such factors as the importance of preceding or succeeding words,
the type of punctuation, and the like. The punctuation analysis unit 38 is capable of
receiving text segments from the controller 30 and outputting punctuation importance
information to the controller 30 via link 48. Note that punctuation analysis is not a key
aspect of this invention; therefore, the punctuation analysis unit 38 may be omitted in some
embodiments.

The controller 30 is responsible for overseeing the linguistic profiling process within 
the linguistic profiling unit 12. The controller 30 implements a number of alternative
"linguistic profiling strategies" or operating modes which govern the method by which 
playing rate indicator (PRI) values associated with words and punctuation in a text segment 
are ascertained. The active strategy determines which of the modules 32, 34, 36, and 38 
will be employed in the PRI assignment process, and how they will be employed. 
Strategies are user-selectable via input 24 to the controller 30.

Table I below provides a representation of two exemplary linguistic profiling 
strategies that may be implemented by the controller 30. The first strategy, Strategy A, is 
relatively simple, requiring only that the words of the text segment be compared against 
entries in the pre-selected word inventory 32. That is, according to Strategy A, a controller
30 processing a text segment will only increment the PRI of a word (i.e. change the PRI
value to indicate a faster playing rate for the word) if the word matches an entry in the
pre-selected word inventory 32. The second strategy, Strategy B, is more complex. Strategy B
employs each of the four modules 32, 34, 36 and 38 in the linguistic profiling process. As 
indicated in Table I, a controller 30 processing a text segment in accordance with Strategy B 
will increment a word's PRI either when the word matches an entry in the pre-selected word 
inventory 32, or when the word is identified to be a preposition by the grammar analysis 
unit 34. Furthermore, if the word is determined to have four or more syllables by the 
syllable counter 36, the word's PRI will be set to a "long" value regardless of its previous 
PRI. This aspect of Strategy B distinguishes long words, which will be accelerated only 
slightly in accordance with typical human speaking patterns, from standard words, which 
may be accelerated to a greater degree. In addition, according to Strategy B, a controller 30
will increment the PRI of each element of punctuation identified as a comma (in order to 
shorten the pause associated with commas) and decrement that of each element of 
punctuation identified as a period (to effect a greater pause duration at the end of sentences). 



Linguistic    Pre-selected Word     Grammar         Syllable          Punctuation
Profiling     Matching?             Analysis?       Counting?         Analysis?
Strategy

Strategy A    ON: increment PRI     OFF             OFF               OFF
              of matched words

Strategy B    ON: increment PRI     ON: increment   ON: flag words    ON: increment PRI
              of matched words      PRI of          having 4+         associated with
                                    prepositions    syllables as      commas; decrement
                                                    "long"            PRI associated with
                                                                      periods.

TABLE I: Linguistic Profiling Strategies
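
The strategies of Table I can be pictured as a small configuration record consulted by the controller 30. The Python sketch below is illustrative only: the class name `ProfilingStrategy`, its field names, and the helper `active_modules` are assumptions and do not appear in the described embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfilingStrategy:
    """One row of Table I: which profiling modules a strategy switches on."""
    word_matching: bool           # pre-selected word inventory 32
    grammar_analysis: bool        # grammar analysis unit 34
    syllable_counting: bool       # syllable counter 36
    punctuation_analysis: bool    # punctuation analysis unit 38
    long_word_syllables: int = 4  # syllable threshold for the "long" flag


STRATEGY_A = ProfilingStrategy(True, False, False, False)
STRATEGY_B = ProfilingStrategy(True, True, True, True)


def active_modules(strategy: ProfilingStrategy) -> list[str]:
    """Return the names of the modules that the given strategy employs."""
    flags = {
        "word_matching": strategy.word_matching,
        "grammar_analysis": strategy.grammar_analysis,
        "syllable_counting": strategy.syllable_counting,
        "punctuation_analysis": strategy.punctuation_analysis,
    }
    return [name for name, enabled in flags.items() if enabled]


print(active_modules(STRATEGY_A))  # ['word_matching']
```

Keeping the syllable threshold as a field rather than a constant makes it easy to express variants such as a strategy that flags words of three or more syllables.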



The controller 30 develops a PRI data array for each text segment being processed 
within linguistic profiling unit 12. An exemplary PRI array 60 is illustrated in FIG. 3. Each 
element of the array 60 represents a word or element of punctuation in the text segment, and 
contains an enumerated value representing the PRI of the corresponding word or element of
punctuation. In the present embodiment, there are three enumerated PRI values for words 
and punctuation: "slow", "normal", and "fast". An additional value of "long" is used in 
association with long words (i.e. words having a high syllable count). 

An exemplary architecture of the TTS engine 14 is shown in FIG. 4. The TTS
engine 14 is responsible for converting input text segments and PRI information into 
audible speech. It should be appreciated that many aspects of this structure are well known 
to those skilled in the art and are described, for example, in US Patent No. 5,774,854, the 
contents of which are incorporated by reference herein. 

The TTS engine 14 contains a linguistic processor 50 and an acoustic processor 52 
that are interconnected. The linguistic processor 50 is capable of converting input text and 
PRI information into a series of phonemes, pitch and duration values. The linguistic
processor 50 includes a duration assignment unit (not shown) which allows the duration of
words and pauses associated with punctuation to be adjusted in accordance with their 
associated PRI. The linguistic processor 50 may additionally include such sub-components 
as a text tokenizer; a word expansion unit; a syllabification unit; a phonetic transcription 
unit; a part of speech assignment unit; a phrase identification unit; and a breath group 
assembly unit, depending upon the complexity of the employed text-to-speech algorithm. 

The acoustic processor 52 is a module capable of converting a received sequence of 
phonemes, pitch and duration values into sounds comprising audible speech. The acoustic 
processor 52 typically includes such sub-components as a diphone identification unit; a 
diphone concatenation unit; a pitch modifier; and an acoustic transmission unit.

The operation of the present embodiment is illustrated in FIGS. 5A, 5B and 5C, with 
additional reference to FIGS. 1, 2, 6A, 6B, 7A and 7B. It is worth noting that the text-to- 
speech conversion process is broken into two phases. The first phase is the linguistic 
profiling phase, during which input text segment and user control data are converted into 
text and PRI information. This phase spans steps S502 to S558 in FIGS. 5A to 5C and takes 
place within the linguistic profiling unit 12. The second phase is the speech generation 
phase, during which the text and PRI information are converted into audible speech. The 
second phase spans steps S560 to S562 in FIG. 5C and takes place within the TTS engine 
14.

In the first phase, a text segment is input to the TTS system 10 and is received by the
controller 30 in step S502 (FIG. 5A). In the present example, the received data consists of 
the text segment "The motorcycle is in the garage." In response to this input, the controller 
30 initializes a PRI array 660 corresponding to the text segment (FIG. 6A). This step
typically requires the input text segment to be processed into tokens, or units roughly 
corresponding to words and punctuation but possibly including other linguistic constructs 
such as abbreviations, numbers or compound words. In the present example, seven tokens 
(six words and one element of punctuation) are identified. Accordingly, the array 660 has 
seven elements. The first six elements of the array correspond to the six words in the text 
segment, while the seventh element corresponds to the punctuation (a period) after the sixth
word in the text segment. A default PRI value of "normal" is assigned by the controller 30 
to each word and element of punctuation (S504), such that the initial state of the array is as 
shown in FIG. 6A. 
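
The tokenization and initialization steps just described might be sketched in Python as follows. This is a simplified illustration: the function name `init_pri_array` is hypothetical, and the regular expression handles only plain words and single punctuation marks, not the abbreviations, numbers or compound words mentioned above.

```python
import re


def init_pri_array(text_segment: str) -> tuple[list[str], list[str]]:
    """Split a text segment into tokens and give each a default PRI of "normal"."""
    # Each run of word characters is one token; every other non-space
    # character (e.g. a period or comma) becomes its own punctuation token.
    tokens = re.findall(r"\w+|[^\w\s]", text_segment)
    return tokens, ["normal"] * len(tokens)


tokens, pri = init_pri_array("The motorcycle is in the garage.")
print(tokens)    # ['The', 'motorcycle', 'is', 'in', 'the', 'garage', '.']
print(len(pri))  # 7
```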

Next, in step S506 the controller 30 reads the user control input 24 in order to 
determine which of the two alternative linguistic profiling strategies, Strategy A or Strategy
B, should be employed in the text-to-speech conversion process. In the present example, it 
is assumed that the user has selected Strategy B, as described in Table I above, as the active 
strategy. 

The subsequent steps of the linguistic profiling phase involve the controller 30
interacting with the various linguistic profiling modules 32, 34, 36, and 38, in accordance 
with the active strategy, in order to effect changes to the PRI array 660 that reflect the 
ascertained importance and linguistic characteristics of the associated words and 
punctuation in the text segment. 

In step S508, the controller 30 examines the active strategy (Strategy B) to 
determine whether or not pre-selected word matching is required. Because Strategy B in the 
present example does in fact include pre-selected word matching, the controller 30 proceeds 
to interact with the pre-selected word inventory 32 via link 42 (FIG. 2) in order to determine
whether any of the words in the text segment are contained therein. In the present example,
it is assumed that the pre-selected word inventory 32 has been previously configured to
include entries for the words "A", "AND", and "THE". Interaction between the controller
30 and the pre-selected word inventory 32 in steps S510-S518 reveals that the first word
"The" and fifth word "the" of the text segment match an entry "THE" in the inventory.
Accordingly, controller 30 increments the enumerated PRI value of the first and fifth
elements in array 660 (FIG. 6B) from their default value of "normal" to "fast" (S514),
thereby reflecting the reduced importance of the first and fifth word of the text segment. 
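
A minimal Python sketch of this matching step is given below. The names `INCREMENT`, `INVENTORY` and `apply_word_matching` are illustrative assumptions; the case-insensitive comparison mirrors the example, in which both "The" and "the" match the entry "THE".

```python
# Incrementing a PRI moves it one step toward a faster playing rate.
# A "long" PRI is not on this scale and is left unchanged.
INCREMENT = {"slow": "normal", "normal": "fast", "fast": "fast"}

# Inventory configured as in the example: "A", "AND" and "THE".
INVENTORY = {"A", "AND", "THE"}


def apply_word_matching(tokens, pri, inventory=INVENTORY):
    """Increment the PRI of every token that matches an inventory entry."""
    return [
        INCREMENT.get(p, p) if t.upper() in inventory else p
        for t, p in zip(tokens, pri)
    ]


tokens = ["The", "motorcycle", "is", "in", "the", "garage", "."]
pri = apply_word_matching(tokens, ["normal"] * len(tokens))
print(pri)  # ['fast', 'normal', 'normal', 'normal', 'fast', 'normal', 'normal']
```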

Next, in step S520 (FIG. 5B), the controller 30 examines the active strategy to 
determine whether or not grammatical analysis is required. Because Strategy B in the
present example does in fact include grammatical analysis, in step S522 the controller 30
proceeds to pass the text segment to the module 34 via link 44 (FIG. 2) for grammatical
analysis. The grammar analysis unit 34 performs grammatical analysis in accordance with
the active strategy, which dictates that the analysis is to consist of the identification of the 
part of speech of each word in the text segment. Upon the completion of the analysis, the 
unit 34 communicates the results to the controller 30 via link 44. The controller 30 
examines the results of the analysis for each word (S524-S530) in accordance with the 
active strategy, which further dictates that only prepositions are to have their PRI value
incremented. The examination reveals that the word "in" in the fourth ordinal position of
the input text segment has been identified as a preposition. Accordingly, the controller 30
increments the associated PRI value in the fourth element of array 660 (FIG. 6B) from
"normal" to "fast" in step S528, thereby reflecting the reduced importance of this word. 

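The preposition step can be illustrated with the Python sketch below. Because no real grammar engine is shown here, a fixed lookup table (`TOY_PARTS_OF_SPEECH`, a hypothetical stand-in covering only the example sentence) plays the role of the grammar analysis unit 34; all function and variable names are assumptions.

```python
INCREMENT = {"slow": "normal", "normal": "fast", "fast": "fast"}

# Hypothetical stand-in for the grammar analysis unit 34: a fixed lookup
# table for the example sentence instead of a real grammar engine.
TOY_PARTS_OF_SPEECH = {
    "the": "article",
    "motorcycle": "noun",
    "is": "verb",
    "in": "preposition",
    "garage": "noun",
}


def tag_parts_of_speech(tokens):
    """Return one part-of-speech label per token ("other" if unknown)."""
    return [TOY_PARTS_OF_SPEECH.get(t.lower(), "other") for t in tokens]


def apply_grammar_analysis(tokens, pri):
    """Increment the PRI of every token tagged as a preposition."""
    tags = tag_parts_of_speech(tokens)
    return [
        INCREMENT.get(p, p) if tag == "preposition" else p
        for p, tag in zip(pri, tags)
    ]


tokens = ["The", "motorcycle", "is", "in", "the", "garage", "."]
pri_after_s514 = ["fast", "normal", "normal", "normal", "fast", "normal", "normal"]
print(apply_grammar_analysis(tokens, pri_after_s514))
# ['fast', 'normal', 'normal', 'fast', 'fast', 'normal', 'normal']
```
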
Next, in step S540, the controller 30 examines the active strategy to determine 
whether or not syllable counting is required. This examination reveals that syllable counting is
in fact necessary, and moreover, in accordance with Strategy B, that words having four or
more syllables are to be flagged as "long" words. Accordingly, the controller 30 proceeds
to interact with the syllable counter 36 in steps S542-S548 in order to determine the syllable
count of each word in the text segment. This interaction reveals that the second word in the 
text segment, "motorcycle", does in fact have four syllables and should therefore be flagged 
as a "long" word. Thus, the controller 30 changes the enumerated value associated with the
word "motorcycle", that is, the value in the second ordinal position of array 660 (FIG. 6B),
from "normal" to "long" in step S546.

Subsequently, in step S550 (FIG. 5C), the controller 30 examines the active strategy
to determine whether or not punctuation analysis is required. This examination reveals that
punctuation analysis is in fact necessary, and moreover, that in accordance with Strategy B, 
commas are to have their PRI incremented, and periods are to have their PRI decremented. 
As a result, the controller 30 proceeds to interact with the punctuation analysis unit 38 in 
steps S552-S558 to determine whether pause adjustment is required for any of the
punctuation in the text segment. This interaction reveals that the period following the last
word in the text segment ("garage") should have its PRI decremented. Accordingly, the
controller 30 decrements the enumerated PRI value associated with the final pause, that is,
the value in the seventh ordinal position of the array 660 (FIG. 6B), from "normal" to 
"slow" in step S556. 

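The punctuation step of Strategy B can be sketched in Python as follows. The sketch considers only the type of punctuation (comma or period), as Strategy B specifies, and ignores other factors such as the importance of neighbouring words; the names used are illustrative assumptions.

```python
INCREMENT = {"slow": "normal", "normal": "fast", "fast": "fast"}
DECREMENT = {"fast": "normal", "normal": "slow", "slow": "slow"}


def apply_punctuation_analysis(tokens, pri):
    """Shorten pauses at commas and lengthen pauses at periods."""
    adjusted = list(pri)
    for i, token in enumerate(tokens):
        if token == ",":
            adjusted[i] = INCREMENT.get(adjusted[i], adjusted[i])
        elif token == ".":
            adjusted[i] = DECREMENT.get(adjusted[i], adjusted[i])
    return adjusted


tokens = ["The", "motorcycle", "is", "in", "the", "garage", "."]
pri_after_s546 = ["fast", "long", "normal", "fast", "fast", "normal", "normal"]
print(apply_punctuation_analysis(tokens, pri_after_s546))
# ['fast', 'long', 'normal', 'fast', 'fast', 'normal', 'slow']
```
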
Hence, at the completion of phase 1, the contents of the PRI array 660 are as shown 
in FIG. 6B. At this stage the PRI array as well as the input text segment are communicated 
from the linguistic profiling unit 12 to the TTS engine 14. 

Turning to phase 2, and with additional reference to FIG. 4, the linguistic processor
50 of TTS engine 14 receives the text segment and PRI information via links 18 and 20,
respectively, and proceeds to convert the input text segments to a sequence of phonemes,
pitch and duration values. Duration is assigned to words and punctuation by the duration
assignment unit of the linguistic processor 50 in accordance with the associated PRIs in
array 660. Specifically, the duration of each word and each element of punctuation may be 
assigned as indicated in Table II below. 



WORDS AND PUNCTUATION

PRI       Assigned Duration

Fast      Default duration x 0.5
Normal    Default duration
Slow      Default duration x 1.5
Long      Default duration x 0.75

TABLE II: Duration Assignment
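
The mapping of Table II can be illustrated with a short Python sketch. The factors are those of Table II; the function name `assign_durations` and the default durations in milliseconds are hypothetical values chosen for illustration and are not taken from the figures.

```python
# Duration multipliers from Table II, keyed by PRI value.
DURATION_FACTOR = {"fast": 0.5, "normal": 1.0, "slow": 1.5, "long": 0.75}


def assign_durations(default_durations_ms, pri):
    """Scale each token's default duration by the factor for its PRI."""
    return [d * DURATION_FACTOR[p] for d, p in zip(default_durations_ms, pri)]


# PRI array at the end of linguistic profiling (as in FIG. 6B).
pri = ["fast", "long", "normal", "fast", "fast", "normal", "slow"]
# Hypothetical default durations in milliseconds, one per token.
defaults_ms = [200, 800, 200, 200, 200, 600, 400]

print(assign_durations(defaults_ms, pri))
# [100.0, 600.0, 200.0, 100.0, 100.0, 600.0, 600.0]
```
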

Aside from duration assignment, various other steps may be performed by the
linguistic processor 50, including text tokenization; word expansion; syllabification; 
phonetic transcription; part of speech assignment; phrase identification; and breath group 
assembly, as have been described in the prior art. The exact scope of the processing 
performed by the linguistic processor 50 is dependent upon the complexity of the adopted 
TTS conversion algorithm. The resulting series of phonemes, pitch and duration values are 
then passed to the acoustic processor 52. 

The acoustic processor 52 converts the received series of phonemes, pitch and 
duration values into audible speech. As described in the prior art, this conversion typically 
involves the steps of diphone identification, diphone concatenation, pitch modification and 
acoustic transmission; however, it may alternatively consist of other steps, depending upon
the employed TTS algorithm. Generated speech is provided to the output 22 of the overall 
TTS system 10. 

A graphical representation of the decrease in playing duration effected by the 
present embodiment is provided in FIGS. 7A and 7B. FIG. 7A represents the playing of the 
exemplary text segment "The motorcycle is in the garage." as audible speech at the default 
rate, without any acceleration. That is, FIG. 7A corresponds with an array 660 having a PRI
of "normal" in each of its elements (i.e. similar to FIG. 6A) at the conclusion of the
linguistic profiling phase. FIG. 7B, on the other hand, represents the playing of the same 
text segment after it has been accelerated in accordance with Strategy B and the acceleration 
factors of Table II. In other words, FIG. 7B corresponds with a PRI array 660 having the 
values illustrated in FIG. 6B at the conclusion of the linguistic profiling phase. Note that 
solid borders within FIGS. 7A and 7B indicate audible words while dashed borders indicate 
pauses. Each unit on the horizontal axis represents a fixed unit of time of 0.1 seconds. 

In FIG. 7A, it can be seen that the default playing duration for the exemplary text
segment, without acceleration, is 3.2 seconds. After being processed by the preferred 
embodiment as described above, however, the playing duration is reduced to 2.5 seconds, as 
illustrated in FIG. 7B. Note that only the underlined words have been accelerated, with a 
dotted underline indicating a lesser degree of acceleration associated with a long word.
Advantageously, the playing duration has been reduced by 0.7 seconds or approximately
22%, yet the comprehensibility of the message has not been significantly reduced since such
key words as "garage" have been maintained at their default rate, or have been accelerated 
only slightly (e.g. "motorcycle") in accordance with the active Strategy B. 
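
The stated reduction follows directly from the two playing durations given above, as this brief Python check shows (variable names are illustrative):

```python
default_s = 3.2      # playing duration without acceleration (FIG. 7A)
accelerated_s = 2.5  # playing duration under Strategy B (FIG. 7B)

saved_s = default_s - accelerated_s
reduction_pct = 100 * saved_s / default_s  # 0.7 / 3.2 = 21.875%

print(round(saved_s, 1), round(reduction_pct))  # 0.7 22
```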

The potential modifications to the above-described embodiment are many. 
Significantly, the TTS system 10 may be implemented on multiple computing devices rather 
than just one. For example, the linguistic profiling unit 12 may be implemented on a first 
computing device and the TTS engine 14 may be implemented on a second computing
device. 

As well, a person skilled in the art will recognize that the linguistic profiling unit 12
may have various alternative organizations. The number of linguistic profiling modules 
may be greater than or less than four, depending upon the number and type of techniques 
employed to accelerate speech within the application. In cases where the number of 
linguistic profiling modules is greater than four, techniques other than the ones described 
may be employed to determine the importance of words or pauses in the text segment. 
Also, the allotment of processing as between the controller 30 and the various linguistic 
profiling modules may be different than described. For example, the linguistic profiling
modules may be responsible for making changes to the PRI array 60 directly instead of the
controller 30. Fundamentally, the controller 30 and the various linguistic profiling modules
may not in fact be distinct. Instead, controlling activities and linguistic profiling activities
may be merged within the linguistic profiling unit 12. 

The number and scope of linguistic profiling strategies may also differ. For 
example, in some embodiments, the invention may employ only a single, fixed strategy for 
linguistic profiling that is tailored to the particular application. Alternatively, in cases 
where multiple strategies exist, the active strategy may be automatically selected by the
TTS system 10 based on the characteristics of the input data, rather than being
user-selectable. Furthermore, the scope of linguistic profiling strategies may be broader or
narrower than the scope of the strategies described in Table I, in terms of the manner in
which the array 60 is manipulated. For instance, a different strategy could require, among
other things, that words with three or more syllables (rather than four or more) be flagged as 
"long" words. Some strategies may involve the wholesale skipping of certain words of 
lesser importance to promote greater acceleration of playing speed. Alternatively, other 
strategies may prohibit the skipping or even the acceleration of certain parts of speech that 
are typically central to the comprehensibility of the message, such as nouns. 

Various approaches may also be taken towards the structure and maintenance of PRI 
information associated with a given text segment. For example, PRI information may be 
represented by means of an alternative data structure, such as a linked list, rather than as an
array. Moreover, the range of potential PRI values for a word or element of punctuation 
may be greater than or less than the four enumerated values of the present embodiment, to 
support greater or lesser granularity in the available degrees of speedup (respectively). PRI 
values may also be expressed numerically. Conveniently, numerical values that match 
corresponding acceleration or deceleration factors in the duration assignment unit may be 
employed. Finally, PRI information may be merged with textual data rather than being 
separately maintained. In that case, one link may be sufficient to communicate text and PRI 
information between the linguistic profiling unit 12 and the TTS engine 14.
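
By way of illustration, numerical PRI values merged with the textual data might look like the following Python sketch. The type `RatedToken`, the function `merge_text_and_pri` and the field names are hypothetical; the factors are the exemplary ones of Table II.

```python
from typing import NamedTuple


class RatedToken(NamedTuple):
    """A token carrying its own numerical PRI, so one link can convey both."""
    text: str
    duration_factor: float  # e.g. 0.5 plays the token in half its default time


FACTOR = {"fast": 0.5, "normal": 1.0, "slow": 1.5, "long": 0.75}


def merge_text_and_pri(tokens, pri):
    """Pair each token with the duration factor matching its enumerated PRI."""
    return [RatedToken(t, FACTOR[p]) for t, p in zip(tokens, pri)]


merged = merge_text_and_pri(
    ["in", "the", "garage", "."], ["fast", "fast", "normal", "slow"]
)
print(merged[0])  # RatedToken(text='in', duration_factor=0.5)
```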

It is also worth noting that the acceleration and/or deceleration factors applied by the
duration assignment unit may be different than the exemplary factors of 0.5, 0.75 and 1.5
shown in Table II. Ideally, these factors are easily modifiable to support greater flexibility 
in adapting the present invention to a particular application. 

Lastly, a person skilled in the art will recognize that significant gains in efficiency, 
both in terms of the effort required to implement the invention and in run-time processing, 
may be realized through the elimination of redundancies in the described embodiment, 
especially as between the linguistic profiling unit 12 and the TTS engine 14. For example, a 
common phoneme generator may be employed both for the purposes of syllable counting 
within the linguistic profiling unit 12 and speech generation within the TTS engine 14. As
another example, tokens may be passed from the linguistic profiling unit 12 to the TTS
engine 14 instead of raw text to avoid possible duplication in tokenization processing in the
latter stage. 

The foregoing is merely illustrative of the principles of the invention. Those skilled 
in the art will be able to devise numerous arrangements which, although not explicitly 
shown or described herein, nevertheless embody those principles that are within the spirit 
and scope of the invention, as defined by the claims. 

WHAT IS CLAIMED IS:



1. A method of decreasing the playing duration of speech generated from a text
segment, comprising: 

(a) counting syllables in each word of said text segment; and 

(b) assigning a playing rate indicator to said each word of said text segment based on a 
total number of syllables in said word. 

2. The method of claim 1, further comprising generating speech from said text segment 
such that a playing rate of a generated word is according to said playing rate indicator. 

3. The method of claim 2, wherein said playing rate of a given generated word is 
increased where the playing rate indicator of said word is indicative of a higher number of 
syllables and slowed where the playing rate indicator of said word is indicative of a lower 
number of syllables. 

4. The method of claim 3, further comprising decreasing the duration of pauses 
associated with selected punctuation in said text segment. 

5. The method of claim 1, wherein said playing rate indicator of said each word is 
changed when a syllable count of said each word increases above a threshold number of 
syllables. 

6. A method of decreasing the playing duration of speech generated from a text 
segment, comprising: 

(a) performing a grammatical analysis of said text segment; and 

(b) assigning a playing rate indicator to each word of said text segment based on said 
grammatical analysis. 

7. The method of claim 6, further comprising generating speech from said text segment 
such that a playing rate of a generated word is according to said playing rate indicator. 

8. The method of claim 1, further comprising decreasing the duration of pauses 
associated with selected punctuation in said text segment. 

9. The method of claim 8, wherein said grammatical analysis comprises the 
identification of a part of speech of the words in the text segment. 

10. The method of claim 9, wherein said playing rate indicator of said each word is set 
to reflect a slow playing rate for certain parts of speech and a fast playing rate for other 
parts of speech. 

11. The method of claim 10, wherein said certain parts of speech comprise nouns. 

12. The method of claim 11, wherein a word with a playing rate indicator indicative of a 
slow playing rate is omitted from the generated speech. 

13. A method of decreasing the playing duration of speech generated from a text 
segment, comprising: 

(a) comparing each word of said text segment to an inventory of pre-selected words; 
and 

(b) assigning a playing rate indicator to said each word of said text segment based on 
said comparison. 

14. The method of claim 13, further comprising generating speech from said text 
segment such that a playing rate of a generated word is according to said playing rate 
indicator. 

15. The method of claim 14, further comprising decreasing the duration of pauses 
associated with selected punctuation in said text segment. 

16. The method of claim 15, wherein said playing rate indicator of each word is set to 
reflect a slow playing rate when said each word matches an entry in said inventory. 

17. The method of claim 16, further comprising omitting from the generated speech a 
word with a playing rate indicator indicative of a slow playing rate. 

18. A computing device comprising: 

(a) a processor; 

(b) persistent storage memory in communication with said processor, storing processor 
readable instructions adapting said device to: 

(i) receive a text segment; 

(ii) count syllables in each word of said text segment; and 

(iii) assign a playing rate indicator to said each word of said text segment 
based on a total number of syllables in said word. 

19. The computing device of claim 17, wherein said processor readable instructions 
further adapt said device to: 

(iv) generate speech from said text segment such that a playing rate of a 
generated word is according to said playing rate indicator. 

20. A computing device comprising: 

(a) a processor; 

(b) persistent storage memory in communication with said processor, storing processor 
readable instructions adapting said device to: 

(i) receive a text segment; 

(ii) perform a grammatical analysis of said text segment; and 

(iii) assign a playing rate indicator to each word of said text segment based on 
said grammatical analysis. 

21. The computing device of claim 19, wherein said processor readable instructions 
further adapt said device to: 

(iv) generate speech from said text segment such that a playing rate of a 
generated word is according to said playing rate indicator. 

22. A computing device comprising: 

(a) a processor; 

(b) persistent storage memory in communication with said processor, storing processor 
readable instructions adapting said device to: 

(i) receive a text segment; 

(ii) compare each word of said text segment to an inventory of pre-selected 
words; and 

(iii) assign a playing rate indicator to said each word of said text segment 
based on the results of said comparison. 

23. The computing device of claim 21, wherein said processor readable instructions 
further adapt said device to: 

(iv) generate speech from said text segment such that a playing rate of a 
generated word is according to said playing rate indicator. 

24. A computer readable medium storing computer software that, when loaded into a 
computing device, adapts said device to: 

(a) receive a text segment; 

(b) count syllables in each word of said text segment; and 

(c) assign a playing rate indicator to said each word of said text segment based on a 
total number of syllables in said word. 

25. The computer readable medium of claim 23, wherein said computer software further 
adapts said device to: 

(d) generate speech from said text segment such that a playing rate of a generated word 
is according to said playing rate indicator. 

26. A computer readable medium storing computer software that, when loaded into a 
computing device, adapts said device to: 

(a) receive a text segment; 

(b) perform a grammatical analysis of said text segment; and 

(c) assign a playing rate indicator to each word of said text segment based on said 
grammatical analysis. 

27. The computer readable medium of claim 25, wherein said computer software further 
adapts said device to: 

(d) generate speech from said text segment such that a playing rate of a generated word 
is according to said playing rate indicator. 

28. A computer readable medium storing computer software that, when loaded into a 
computing device, adapts said device to: 

(a) receive a text segment; 

(b) compare each word of said text segment to an inventory of pre-selected words; and 

(c) assign a playing rate indicator to said each word of said text segment based on the 
results of said comparison. 

29. The computer readable medium of claim 27, wherein said computer software further 
adapts said device to: 

(d) generate speech from said text segment such that a playing rate of a generated word 
is according to said playing rate indicator. 

ABSTRACT 



A method and device for converting text to speech such that playback 
duration is decreased while the comprehensibility of the generated speech is not 
significantly reduced is disclosed. A text segment initially undergoes linguistic profiling 
wherein a playing rate indicator for each word, and optionally each element of punctuation, 
is determined. The playing rate indicator is set to reflect the importance of the associated 
word or element of punctuation as ascertained through an application of speed-reading 
techniques, such as matching against a pre-selected word inventory, grammatical analysis, 
or punctuation analysis. As well, the playing rate indicator may reflect certain linguistic 
characteristics of the associated word, such as its length. The text is subsequently converted 
to speech by a text-to-speech engine capable of varying the playing speed of each word, and 
each pause associated with punctuation, in the text segment according to the corresponding 
playing rate indicator of the word or element of punctuation. 

Attorney Docket No. 91436-220 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

COMBINED DECLARATION AND POWER OF ATTORNEY 

As a below named inventor, I hereby declare that: my residence, post office address and citizenship are as stated 
below next to my name; that I verily believe that I am the original, first and sole inventor (if only one name is 
listed below) or a joint inventor (if plural inventors are named below) of the subject matter which is claimed 
and for which a patent is sought on the invention entitled: 

APPLICATION OF SPEED READING TECHNIQUES IN TEXT-TO-SPEECH GENERATION 



the specification of which 

(check one) ☒ is attached hereto. 

□ was filed on 

as U.S. Application Serial No. . 

□ was filed on 

as PCT International Application No. PCT / . 

and (if applicable) was amended on . 

I hereby state that I have reviewed and understand the contents of the above identified specification, including 
the claims, as amended by any amendment referred to above. 

I acknowledge the duty to disclose information known to me which is material to the examination of this 
application in accordance with Title 37, Code of Federal Regulations, §§ 1.56(a) and (b), which state: 

"(a) A patent by its very nature is affected with a public interest. The public interest is best served, 
and the most effective patent examination occurs when, at the time an application is being examined, 
the Office is aware of and evaluates the teachings of all information material to patentability. Each 
individual associated with the filing and prosecution of a patent application has a duty of candor and 
good faith in dealing with the Office, which includes a duty to disclose to the Office all information 
known to that individual to be material to patentability as defined in this section. The duty to disclose 
information exists with respect to each pending claim until the claim is cancelled or withdrawn from 
consideration, or the application becomes abandoned. Information material to the patentability of a claim that is 
cancelled or withdrawn from consideration need not be submitted if the information is not material to 
the patentability of any claim remaining under consideration in the application. There is no duty to 
submit information which is not material to the patentability of any existing claim. The duty to disclose 
all information known to be material to patentability is deemed to be satisfied if all information known 
to be material to patentability of any claim issued in a patent was cited by the Office or submitted to 
the Office in the manner prescribed by §§ 1.97(b)-(d) and 1.98. However, no patent will be granted 
on an application in connection with which fraud on the Office was practised or attempted or the duty 
of disclosure was violated through bad faith or intentional misconduct. The Office encourages 
applicants to carefully examine: 

(1) prior art cited in search reports of a foreign patent office in a counterpart application. 

(2) the closest information over which individuals associated with the filing or prosecution of 
a patent application believe any pending claim patentably defines, to make sure that any 
material information contained therein is disclosed to the Office. 

(b) Under this section, information is material to patentability when it is not cumulative to 
information already of record or being made of record in the application, and 

(1) It establishes, by itself or in combination with other information, a prima facie case of 
unpatentability of a claim; or 

(2) It refutes, or is inconsistent with, a position the applicant takes in: 

(i) Opposing an argument of unpatentability relied on by the Office, or 

(ii) Asserting an argument of patentability. 

A prima facie case of unpatentability is established when the information compels a conclusion that a 
claim is unpatentable under the preponderance of evidence, burden-of-proof standard, giving each term 
in the claim its broadest reasonable construction consistent with the specification, and before any 
consideration is given to evidence which may be submitted in an attempt to establish a contrary 
conclusion of patentability." 

I hereby claim foreign priority benefits under 35 United States Code, § 119 and/or § 365 of any foreign 
application(s) for patent or inventor's certificate listed below and have also identified below any foreign 
application for patent or inventor's certificate filed by me or my assignee disclosing the subject matter claimed 
in this application and having a filing date (1) before that of the application on which priority is claimed, or (2) 
if no priority claimed, before the filing of this application: 

PRIOR FOREIGN APPLICATION(S) 

Number | Country | Filing Date (Day/Month/Year) | Date First Laid-open or Published | Date Patented or Granted | Priority Claimed? 

N/A 



I hereby claim the benefit under 35 United States Code, § 119(e) of any United States provisional application(s) 
listed below: 

Application Number Filing Date 

N/A 

I hereby claim the benefit under Title 35, United States Code, § 120 of any United States application(s) listed 
below and, insofar as the subject matter of each of the claims of this application is not disclosed in the prior 
United States application in the manner provided by the first paragraph of Title 35, United States Code, § 112, 
I acknowledge the duty to disclose information which is material to patentability as defined in Title 37, Code 
of Federal Regulations, § 1.56(a) which became available between the filing date of the prior application and 
the national or PCT international filing date of this application: 

PRIOR U.S. OR PCT APPLICATION(S) 

Application No. | Filing Date (day/month/year) | Status (pending, abandoned, granted) 



N/A 






I hereby declare that all statements made herein of my own knowledge are true and that all statements made 
on information and belief are believed to be true; and further that these statements were made with the 
knowledge that wilful false statements and the like so made are punishable by fine or imprisonment, or both, 
under Section 1001 of Title 18 of the United States Code and that such wilful false statements may jeopardize 
the validity of the application or any patent issued thereon. 

I hereby appoint the following patent agents with full power of substitution, association and revocation to 
prosecute this application and/or international application and to transact all business in the Patent and 
Trademark Office connected therewith: 



JOHN R. MORRISSEY (Reg. No. 28585) 
KELTIE R. SIM (Reg. No. 34535) 
ALISTAIR G. SIMPSON (Reg. No. 37040) 
MATTHEW ZISCHKA (Reg. No. 41575) 
YWE LOOPER (Reg. No. 43,758) 



GUNARS GAIKIS (Reg. No. 32811) 
RONALD D. FAGGETTER (Reg. No. 33345) 
YOON KANG (Reg. No. 40386) 
JONATHAN D. CUTLER (Reg. No. 40576) 



PLEASE SEND CORRESPONDENCE TO: SMART & BIGGAR 

438 University Avenue 
Suite 1500, Box 111 
Toronto, Ontario 
Canada M5G 2K8 

Telephone: (416) 593-5514 
Facsimile: (416) 591-1690 

Attention: Ronald D. Faggetter 



INVENTOR'S SIGNATURE: 

Inventor's Name: Conal Walsh 

(First) (Middle Initial) (Family Name) 

Country of Citizenship: Australia 



Residence: Toronto, Ontario, Canada 



(City, Province, Country) 

Post Office Address: 41 Hoyle Avenue, Toronto, Ontario M4S 2X5, Canada 